1. Introduction

Systems of nonlinear equations can seldom be solved exactly. Usually,
one must obtain approximations to the solutions of such systems by iteration.
Quasi-Newton methods (also known as variable metric, variance, secant, update,
or modification methods) constitute a class of iterative procedures which may
be regarded as generalizations of the secant method for solving a single
equation in one unknown. Indeed, not only is the quasi-Newton equation (the
equation characteristically satisfied by the iterates produced by these methods)
a direct extension of the equation which defines the iterates of the secant
method, but also these procedures share many of the computational advantages
of the secant method over Newton's method.

Quasi-Newton methods were first introduced in the papers of Davidon [2],
Fletcher and Powell [4], and Broyden [1]. In spite of their recent origins,
these methods have proved themselves in dealing with practical problems and
have become the subject of a large amount of research. The paper of Dennis
and Moré [3] provides both an excellent in-depth survey and an elegant unified
development of quasi-Newton methods and their theory as understood in the
mid-1970's. The main body of this note is a rearrangement and condensation of
material in [3].

In the following, we first formulate precisely the problem to be solved
and motivate the introduction of quasi-Newton methods by considering the
classical Newton and secant methods and their properties. We then survey
three highly successful quasi-Newton methods: Broyden's method for the
solution of general nonlinear equations, and the Davidon-Fletcher-Powell
and Broyden-Fletcher-Goldfarb-Shanno procedures for unconstrained minimization.
(The last two methods will henceforth be referred to as the DFP and BFGS methods,
respectively.) Finally, we compare the properties of these methods to those of
Newton's method and UHMLE in potential applications to maximum-likelihood
estimation of parameters in mixture distributions.

2. The problem 

We consider the problem of solving $F(x) = 0$ in an open convex subset
$D$ of $R^n$ under the following assumptions on the mapping $F: D \to R^n$:

(a) $F$ is continuously differentiable on $D$,

(b) There is an $x^*$ in $D$ such that $F(x^*) = 0$ and
$F'(x^*)$ is nonsingular.

Newton's method for iteratively approximating the solution $x^*$ begins with
an initial approximation $x_0$ to $x^*$ and attempts to obtain improved
approximations by the iteration

    $x_{k+1} = x_k - F'(x_k)^{-1} F(x_k)$,    $k = 0, 1, \ldots$ .

The convergence properties of Newton's method which are important here are 
summarized in the following theorem. 
Theorem: Whenever $x_0$ is sufficiently near $x^*$, there is a sequence
$\{\alpha_k\}$ of non-negative numbers which converges to zero and for which

(1)    $|x_{k+1} - x^*| \le \alpha_k |x_k - x^*|$,    $k = 0, 1, \ldots$ .

If, in addition to satisfying assumptions (a) and (b) above, $F$ has a derivative
which is Lipschitz continuous at $x^*$, i.e., there exists a $\kappa$ for which
$|F'(x) - F'(x^*)| \le \kappa |x - x^*|$ for all $x$ sufficiently near $x^*$, then there
exists a constant $\beta$ such that

(2)    $|x_{k+1} - x^*| \le \beta |x_k - x^*|^2$,    $k = 0, 1, \ldots$

whenever $x_0$ is sufficiently near $x^*$.

A sequence which satisfies an inequality of the form (1) with a sequence
$\{\alpha_k\}_{k=0,1,\ldots}$ which converges to zero is said to converge superlinearly. If
a sequence satisfies an inequality of the form (2), then it is said to converge
quadratically. Superlinear convergence is fast; quadratic convergence is very
fast. Since Lipschitz continuity is a very weak assumption, one might say that
the theorem asserts that the convergence exhibited by the Newton iterates is
always fast and almost always very fast.
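
As an illustration of the iteration and of the error decay described by (2),
the following Python sketch applies Newton's method to a small test system.
The system `F`, its Jacobian `J`, the starting point, and the function name
`newton` are assumptions chosen for the example and do not come from this note.

```python
import numpy as np

def newton(F, J, x0, tol=1e-12, max_iter=20):
    """Newton iteration x_{k+1} = x_k - F'(x_k)^{-1} F(x_k)."""
    x = np.asarray(x0, dtype=float)
    history = [x.copy()]
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        # Solve F'(x_k) d = F(x_k) rather than forming the inverse explicitly.
        x = x - np.linalg.solve(J(x), Fx)
        history.append(x.copy())
    return x, history

# Hypothetical test system with solution x* = (1, 1).
def F(x):
    return np.array([x[0] ** 2 + x[1] ** 2 - 2.0, x[0] - x[1]])

def J(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])

x_star = np.array([1.0, 1.0])
root, history = newton(F, J, [2.0, 0.5])
errors = [np.linalg.norm(x - x_star) for x in history]
for k, e in enumerate(errors):
    print(k, e)
```

Printing the errors shows the roughly squared reduction per step that
characterizes quadratic convergence once the iterates are close to $x^*$.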

The rapid convergence of the Newton iterates is the major advantage of
Newton's method. Another advantage is that Newton's method is "self-corrective"
in the sense that $x_{k+1}$ depends only on $F$ and $x_k$ so that bad effects of
previous iterations are not carried along. (Quasi-Newton methods are not
self-corrective in this sense.) Balanced against these advantages is the fact that
Newton's method often requires a great deal of computation at each iteration.
Indeed, the determination of each iterate requires $O(n^2)$ function evaluations
and $O(n^3)$ arithmetic operations. Thus one is led to ask whether there
are methods which retain fast convergence while requiring fewer function
evaluations and arithmetic operations at each iteration.

With this question in mind, consider the secant method in the case
$n = 1$. This method begins with an initial approximation $x_0$ to $x^*$ (together
with a second starting point $x_{-1}$) and defines successive approximations by
the iteration

    $x_{k+1} = x_k - \frac{x_k - x_{k-1}}{F(x_k) - F(x_{k-1})} F(x_k)$ .


One may regard the secant method as being obtained from Newton's method by
replacing the derivative $F'(x_k)$ by a finite-difference approximation. A
particular consequence is that the number of function evaluations per iteration
is reduced from two for Newton's method to one for the secant method while the
number of arithmetic operations per iteration is not significantly increased.

It can be proved that, for $x_0$ sufficiently near $x^*$, the iterates produced
by the secant method exhibit superlinear convergence rather than quadratic
convergence as in the case of the Newton iterates. Nevertheless, superlinear
convergence is still fast, and experience has shown that, as a general-purpose
algorithm, the secant method is more efficient in total computation time than
Newton's method. This suggests that generalizations of the secant method to
higher dimensions might be similarly successful.
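
The scalar secant iteration can be sketched in a few lines of Python. This is
a minimal illustration under stated assumptions: the function name `secant`,
the stopping rule, and the test equation $x^2 - 2 = 0$ are choices made for
the example, not part of this note.

```python
def secant(F, x_prev, x_curr, tol=1e-12, max_iter=50):
    """Secant iteration for a scalar equation F(x) = 0."""
    F_prev, F_curr = F(x_prev), F(x_curr)
    for _ in range(max_iter):
        # Stop on a small residual, or if the difference quotient is undefined.
        if abs(F_curr) < tol or F_curr == F_prev:
            break
        x_next = x_curr - (x_curr - x_prev) / (F_curr - F_prev) * F_curr
        x_prev, F_prev = x_curr, F_curr
        # Only one new function evaluation is needed per iteration.
        x_curr, F_curr = x_next, F(x_next)
    return x_curr

# Hypothetical test equation with positive root sqrt(2).
root = secant(lambda x: x * x - 2.0, 1.0, 2.0)
print(root)
```

Note that each pass through the loop calls `F` once, whereas a Newton step
for the same equation would need both `F` and its derivative.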


3. Quasi-Newton methods 

Quasi-Newton methods are generalizations of the secant method which are
applicable to problems of the type at hand involving an arbitrary number of
independent variables. The key properties of these methods are that the
iterates exhibit superlinear local convergence and that each iteration
requires $n$ function evaluations and $O(n^2)$ arithmetic operations. In
spite of the fact that quasi-Newton methods do not have the quadratic
convergence property of Newton's method, the comparatively small number of function
evaluations and arithmetic operations make them preferable to Newton's method
in many applications.

Quasi-Newton methods have the general form

    $x_{k+1} = x_k - B_k^{-1} F(x_k)$,

where $B_k$ satisfies the quasi-Newton equation

(3)    $B_k (x_k - x_{k-1}) = F(x_k) - F(x_{k-1})$ .

Note that $B_k$ has the action of a finite-difference approximation to
$F'(x_k)$ in the direction $(x_k - x_{k-1})$. Thus quasi-Newton methods in general
bear the same relation to Newton's method as the secant method in the case
$n = 1$.

It is clear that the secant method is a quasi-Newton method. In fact,
if $n = 1$, then the quasi-Newton equation determines the scalar $B_k$ exactly,
and so the secant method is the only quasi-Newton method in this case. If
$n > 1$, then the quasi-Newton equation alone does not determine $B_k$ uniquely;
hence, there is no unique natural extension of the secant method to the case
of an arbitrary number of independent variables. This lack of uniqueness in
the general case may be regarded as an advantage, for it allows a variety of
quasi-Newton algorithms which may be drawn upon to take advantage of any
special structure which may be present in specific problems of interest.
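
The counting behind this non-uniqueness can be written out explicitly. The
following is a worked restatement of the quasi-Newton equation in the two
cases, offered as an illustration rather than as part of the original
development.

```latex
% n = 1: the quasi-Newton equation is a single scalar equation and fixes B_k,
% which reproduces the secant iteration.
B_k = \frac{F(x_k) - F(x_{k-1})}{x_k - x_{k-1}}
\quad\Longrightarrow\quad
x_{k+1} = x_k - \frac{x_k - x_{k-1}}{F(x_k) - F(x_{k-1})}\, F(x_k).

% n > 1: the quasi-Newton equation supplies n scalar equations for the
% n^2 entries of B_k, so n^2 - n degrees of freedom remain.
B_k \,(x_k - x_{k-1}) = F(x_k) - F(x_{k-1}),
\qquad B_k \in \mathbb{R}^{n \times n}.
```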
When $n > 1$, one must impose relations between successive matrices $B_k$
and their predecessors which, together with the quasi-Newton equation,
uniquely determine those matrices inductively. In general, these relations
are chosen with an eye toward minimizing the computational complexity of the
resulting update formula for determining $B_{k+1}$ from $B_k$ and $F$ while
taking maximal advantage of whatever special structure may be shared by the
particular problems under consideration. Of the three quasi-Newton methods
presented below, the first (Broyden's method) is intended to be a general
purpose algorithm which can be applied to all problems without regard to
special structure. Consequently, in Broyden's method, $B_{k+1}$ is obtained by
adding a rank-one "correction term" to $B_k$ in such a way that the
quasi-Newton equation is satisfied and $B_{k+1}$ agrees with $B_k$ on the orthogonal
complement of $(x_{k+1} - x_k)$. In a sense, this may be regarded as the "simplest"
way to obtain $B_{k+1}$ from $B_k$ in such a way that the quasi-Newton equation is
satisfied. On the other hand, the second two methods (the DFP and BFGS methods)
are designed for unconstrained minimization problems, in which the Jacobian
$F'(x)$ can be expected to be symmetric and positive-definite. Thus the update
formulas for these methods are such that the successive $B_k$'s "inherit"
symmetry and positive-definiteness from the preceding ones. Not surprisingly,
these formulas are more complex than the update formula of Broyden's method.
In fact, in order to guarantee hereditary symmetry and positive-definiteness,
it is necessary in these formulas to determine $B_{k+1}$ from $B_k$ with a
correction term of rank two.
4. Broyden's method for general nonlinear equations

Broyden's method is, in a sense, the "simplest" of the most popular
quasi-Newton methods and is intended to be a general-purpose algorithm for
solving arbitrary nonlinear equations. To derive the formula used in Broyden's
method to update the matrices $B_k$, suppose that, for some $k \ge 0$, one has
arrived at $x_k$ and $B_k$. Then $x_{k+1}$ can be generated by the formula

    $x_{k+1} = x_k - B_k^{-1} F(x_k)$ .

Our objective is to use $x_k$, $x_{k+1}$, and $F$ to update $B_k$ in the
"simplest" way to obtain a matrix $B_{k+1}$ which satisfies the quasi-Newton
equation.

For convenience, we adopt the following notation:

    $x = x_k$,  $\bar{x} = x_{k+1}$,  $B = B_k$,  $\bar{B} = B_{k+1}$,
    $s = \bar{x} - x = x_{k+1} - x_k$,  $y = F(\bar{x}) - F(x)$ .

In this notation, the quasi-Newton equation which we wish to satisfy
is $\bar{B} s = y$. This equation uniquely specifies the action of $\bar{B}$ in the
direction of $s$. Since there is no apparent reason for $\bar{B}$ to differ from
$B$ on the orthogonal complement of $s$, it seems reasonable to impose on $\bar{B}$
the condition that $\bar{B} z = B z$ for all $z$ such that $z^T s = 0$. It is easily
verified that there is a unique $\bar{B}$ which satisfies both this condition and
the quasi-Newton equation. This $\bar{B}$ is given by the formula

    $\bar{B} = B + \frac{(y - Bs) s^T}{|s|^2}$ .

Note that $\bar{B}$ and $B$ differ by a rank-one operator. Restoring subscripts,
we obtain the iteration formulas for Broyden's method:

    $x_{k+1} = x_k - B_k^{-1} F(x_k)$,

    $B_{k+1} = B_k + \frac{(y_k - B_k s_k) s_k^T}{|s_k|^2}$,

where $y_k = F(x_{k+1}) - F(x_k)$ and $s_k = x_{k+1} - x_k$.
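
The two iteration formulas translate directly into code. The Python sketch
below is a minimal illustration: the function name `broyden`, the stopping
rule, the test system, and the choice of the exact Jacobian at the starting
point as `B0` are all assumptions made for the example. For brevity it solves
with $B_k$ by a dense solver, which costs $O(n^3)$ operations per step and so
does not reflect the $O(n^2)$ techniques discussed in this section.

```python
import numpy as np

def broyden(F, x0, B0, tol=1e-10, max_iter=100):
    """Broyden's method: rank-one update of B_k after each step."""
    x = np.asarray(x0, dtype=float)
    B = np.array(B0, dtype=float)
    Fx = F(x)
    for _ in range(max_iter):
        if np.linalg.norm(Fx) < tol:
            break
        s = -np.linalg.solve(B, Fx)        # s_k = x_{k+1} - x_k
        x = x + s
        F_new = F(x)                        # the only new evaluation of F
        y = F_new - Fx                      # y_k = F(x_{k+1}) - F(x_k)
        B += np.outer(y - B @ s, s) / (s @ s)
        Fx = F_new
    return x, B

# Hypothetical test system with solution x* = (1, 1).
def F(x):
    return np.array([x[0] ** 2 + x[1] ** 2 - 2.0, x[0] - x[1]])

x0 = np.array([1.2, 0.9])
B0 = np.array([[2.0 * x0[0], 2.0 * x0[1]], [1.0, -1.0]])  # Jacobian at x0
x_star, B_final = broyden(F, x0, B0)
print(x_star)
```

After each update the new matrix satisfies the quasi-Newton equation for the
step just taken, since `(B + outer(y - B s, s) / (s.s)) @ s` equals `y`.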

Does Broyden's method exhibit the key properties attributed to
quasi-Newton methods in the preceding section? It can be shown that if $x_0$ and
$B_0$ are sufficiently near $x^*$ and $F'(x^*)$, respectively, then the Broyden
iterates are well-defined and converge superlinearly to $x^*$. (The proof is
very involved, and we omit it.) Also, it is clear that, for a given value of
$k$, the determination of $x_{k+1}$ and $B_{k+1}$ requires only the $n$ function
evaluations necessary to specify $F(x_{k+1})$, assuming that $F(x_k)$ can be
provided from storage. Finally, it is evident that, for a given $k$, $x_{k+1}$
and $B_{k+1}$ can be determined with $O(n^2)$ arithmetic operations if
$B_k^{-1} F(x_k)$ can be evaluated with $O(n^2)$ arithmetic operations.

There are two ways of evaluating $B_k^{-1} F(x_k)$ with $O(n^2)$ arithmetic
operations, both of which require information about $B_{k-1}$. The first way is
based on the Sherman-Morrison formula [8] and produces $\bar{B}^{-1}$ from $B^{-1}$ with
$O(n^2)$ arithmetic operations in the following way: write

    $\bar{B} = B + \frac{(y - Bs) s^T}{|s|^2} = B + u v^T$,

where $u = (y - Bs)$, $v = \frac{s}{|s|^2}$; then

    $\bar{B}^{-1} = B^{-1} - \frac{B^{-1} u v^T B^{-1}}{1 + \langle v, B^{-1} u \rangle}$ .
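
The Sherman-Morrison step can be checked numerically. The Python sketch below
is a minimal illustration; the function name `sherman_morrison` and the
matrix and vectors used in the check are assumptions chosen for the example.
Only matrix-vector products and one outer product are used, so the update
costs $O(n^2)$ operations.

```python
import numpy as np

def sherman_morrison(B_inv, u, v):
    """Return (B + u v^T)^{-1} given B^{-1}, using O(n^2) operations."""
    Bu = B_inv @ u                 # B^{-1} u
    vB = v @ B_inv                 # v^T B^{-1}
    denom = 1.0 + v @ Bu           # 1 + <v, B^{-1} u>
    if abs(denom) < 1e-14:
        raise ZeroDivisionError("updated matrix is (numerically) singular")
    return B_inv - np.outer(Bu, vB) / denom

# Hypothetical data: a nonsingular B and a Broyden-type rank-one correction.
B = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
s = np.array([1.0, -1.0, 2.0])
y = B @ s + np.array([0.1, -0.2, 0.05])
u = y - B @ s
v = s / (s @ s)

B_bar = B + np.outer(u, v)
B_bar_inv = sherman_morrison(np.linalg.inv(B), u, v)
print(np.allclose(B_bar_inv @ B_bar, np.eye(3)))
```

In an actual iteration one would carry $B^{-1}$ along from step to step, so
the call to `np.linalg.inv` above appears only to set up the check.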



The second way is based on a special factorization procedure due to Gill
and Murray [5] which begins with a factorization $B = QR$ and yields a
factorization $\bar{B} = \bar{Q} \bar{R}$ with $O(n^2)$ arithmetic operations. (Here, $Q$ and
$\bar{Q}$ are orthogonal and $R$ and $\bar{R}$ are upper-triangular.) Since an n-dimensional
linear system whose coefficient matrix is factored in this way can be solved with
$O(n^2)$ arithmetic operations, this allows the evaluation of the terms $B_k^{-1} F(x_k)$
with $O(n^2)$ arithmetic operations as desired. For reasons of numerical stability,
the Gill-Murray factorization procedure is generally preferable to the method
using the Sherman-Morrison formula.

5. The DFP and BFGS methods for unconstrained minimization

For the purposes of this note, the basic problem of unconstrained minimization
may be regarded as the problem of solving $\nabla f(x) = 0$ in an open convex subset $D$
of $R^n$, where $f$ is a nonlinear functional from $D$ to $R^1$. Clearly, this
problem is of the type introduced in Section 2, with $\nabla f$ playing the role of $F$.
The special feature of this problem is that the Jacobian of the function whose
zero is being sought is actually the Hessian $\nabla^2 f$, a matrix which is certainly
symmetric. In fact, in most problems of practical interest, $\nabla^2 f$ is
positive-definite near the minimum of $f$.

It seems reasonable to require that the matrices $B_k$ appearing in a
quasi-Newton method applied to an unconstrained minimization problem be symmetric and
positive-definite. Since each $B_k$ is to be determined from its predecessor
by an update formula, it is reasonable to impose conditions on the update formula
which guarantee that symmetry and positive-definiteness are inherited by the
successive matrices $B_k$. Unfortunately, imposing hereditary symmetry as well as
the quasi-Newton equation completely determines a rank-one update formula, and
this formula does not guarantee hereditary positive-definiteness. Consequently,
one is led to look for rank-two update formulas which insure that the successive
matrices inherit symmetry and positive-definiteness.

A general rank-two update formula which guarantees hereditary symmetry
is the following:

    $\bar{B} = B + \frac{(y - Bs) c^T + c (y - Bs)^T}{\langle c, s \rangle} - \frac{\langle y - Bs, s \rangle \, c c^T}{\langle c, s \rangle^2}$,

where $c$ is any vector in $R^n$ such that $\langle c, s \rangle \ne 0$. A "natural" choice of
$c$ which insures hereditary positive-definiteness whenever $\langle y, s \rangle > 0$ is
$c = y$. (Since $\langle y, s \rangle \approx \langle \nabla^2 f(x^*) s, s \rangle$ near $x^*$, one expects $\langle y, s \rangle$ to be
positive near $x^*$.) The resulting update formula is that used in the
Davidon-Fletcher-Powell (DFP) method. Denoting by $\bar{B}_{DFP}$ the updated matrix
obtained from $B$ by applying this formula, one has


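    $\bar{B}_{DFP} = B + \frac{(y - Bs) y^T + y (y - Bs)^T}{\langle y, s \rangle} - \frac{\langle y - Bs, s \rangle \, y y^T}{\langle y, s \rangle^2}$

    $\phantom{\bar{B}_{DFP}} = \left( I - \frac{y s^T}{\langle y, s \rangle} \right) B \left( I - \frac{s y^T}{\langle y, s \rangle} \right) + \frac{y y^T}{\langle y, s \rangle}$ .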


As with Broyden's method, one can show that the DFP iterates converge
superlinearly to $x^*$ whenever $x_0$ and $B_0$ are sufficiently near $x^*$ and
$\nabla^2 f(x^*)$, respectively, and that each iteration requires $n$ function
evaluations and $O(n^2)$ arithmetic operations. Although the DFP update
formula is a bit more complicated than the Broyden update formula, experience
has shown that the DFP method is generally superior to Broyden's method for
problems in unconstrained minimization.
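
The hereditary properties of the DFP update can be checked numerically. The
Python sketch below is a minimal illustration using the product form of the
update; the function name `dfp_update` and the sample data are assumptions
chosen for the example. The dense matrix products here cost $O(n^3)$; an
implementation aiming at $O(n^2)$ would expand the products into
matrix-vector operations.

```python
import numpy as np

def dfp_update(B, s, y):
    """DFP update: (I - y s^T/<y,s>) B (I - s y^T/<y,s>) + y y^T/<y,s>."""
    rho = y @ s
    if rho <= 0.0:
        raise ValueError("<y, s> must be positive to keep B positive-definite")
    V = np.eye(len(s)) - np.outer(y, s) / rho
    return V @ B @ V.T + np.outer(y, y) / rho

# Hypothetical data: B symmetric positive-definite and <y, s> = 2.75 > 0.
B = np.eye(3)
s = np.array([1.0, 0.5, -0.5])
y = np.array([2.0, 0.5, -1.0])

B_dfp = dfp_update(B, s, y)
print(np.allclose(B_dfp @ s, y))                 # quasi-Newton equation
print(np.allclose(B_dfp, B_dfp.T))               # symmetry is inherited
print(np.all(np.linalg.eigvalsh(B_dfp) > 0.0))   # positive-definiteness
```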




At the kth iteration, both Broyden's method and the DFP method
require first the determination of $B_k^{-1} F(x_k)$ and then the updating of $B_k$.
It is natural to ask whether a more efficient method might be obtained by
applying an update formula directly to $B_k^{-1}$. If we denote $B^{-1}$ by $H$
and $\bar{B}^{-1}$ by $\bar{H}$, the quasi-Newton equation $\bar{B} s = y$ becomes $s = \bar{H} y$.
Carrying out a development completely analogous to that leading to the DFP
update formula yields the update formula of the Broyden-Fletcher-Goldfarb-Shanno
(BFGS) method. Denoting by $\bar{H}_{BFGS}$ the updated matrix obtained from
$H$ by applying this formula, one has




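    $\bar{H}_{BFGS} = \left( I - \frac{s y^T}{\langle y, s \rangle} \right) H \left( I - \frac{y s^T}{\langle y, s \rangle} \right) + \frac{s s^T}{\langle y, s \rangle}$ .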


It is not difficult to see that, as in the case of the DFP update, this
update adds a rank-two correction term to $H$ and guarantees hereditary symmetry
and, if $\langle y, s \rangle > 0$, positive-definiteness. Again, it can be shown that the
BFGS iterates converge superlinearly to $x^*$ whenever $x_0$ and $H_0$ are
sufficiently near $x^*$ and $\nabla^2 f(x^*)^{-1}$, respectively. It is clear that each
iteration requires $n$ function evaluations and $O(n^2)$ arithmetic operations.
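
The same kind of numerical check applies to the BFGS update of the inverse
approximation. The Python sketch below is a minimal illustration; the
function name `bfgs_inverse_update` and the sample data are assumptions
chosen for the example, and the dense products are written for clarity
rather than for the $O(n^2)$ operation count.

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """BFGS update: (I - s y^T/<y,s>) H (I - y s^T/<y,s>) + s s^T/<y,s>."""
    rho = y @ s
    if rho <= 0.0:
        raise ValueError("<y, s> must be positive to keep H positive-definite")
    W = np.eye(len(s)) - np.outer(s, y) / rho
    return W @ H @ W.T + np.outer(s, s) / rho

# Hypothetical data: H symmetric positive-definite and <y, s> = 2.75 > 0.
H = np.eye(3)
s = np.array([1.0, 0.5, -0.5])
y = np.array([2.0, 0.5, -1.0])

H_bfgs = bfgs_inverse_update(H, s, y)
print(np.allclose(H_bfgs @ y, s))                 # inverse quasi-Newton equation
print(np.allclose(H_bfgs, H_bfgs.T))              # symmetry is inherited
print(np.all(np.linalg.eigvalsh(H_bfgs) > 0.0))   # positive-definiteness

# With H available, the next step needs only a matrix-vector product:
# x_next = x - H_bfgs @ grad_f(x), where grad_f is a user-supplied gradient.
```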

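The BFGS method is not the same as the DFP method. In fact,

    $\bar{H}_{BFGS} = \bar{B}_{DFP}^{-1} + v v^T$,

where $v = \langle y, Hy \rangle^{1/2} \left[ \frac{s}{\langle y, s \rangle} - \frac{Hy}{\langle y, Hy \rangle} \right]$. According to [3], there is "growing
evidence that BFGS is the best current update formula for use in unconstrained
minimization".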
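
The difference between the two updates can be made concrete with a numerical
check. The Python sketch below assumes the standard rank-one relation between
the BFGS update of $H$ and the inverse of the DFP update of $B = H^{-1}$; the
particular form of `v` used here is that standard identity and is an
assumption about the formula intended above, and the sample data are
hypothetical.

```python
import numpy as np

# Hypothetical data: B symmetric positive-definite, H = B^{-1}, <y, s> > 0.
B = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
H = np.linalg.inv(B)
s = np.array([1.0, -1.0, 2.0])
y = np.array([3.1, -0.2, 3.05])
rho = y @ s
I = np.eye(3)

# DFP update of B and BFGS update of H, both in product form.
B_dfp = (I - np.outer(y, s) / rho) @ B @ (I - np.outer(s, y) / rho) \
    + np.outer(y, y) / rho
H_bfgs = (I - np.outer(s, y) / rho) @ H @ (I - np.outer(y, s) / rho) \
    + np.outer(s, s) / rho

# Rank-one difference: v = <y, Hy>^{1/2} [ s/<y,s> - Hy/<y,Hy> ].
Hy = H @ y
tau = y @ Hy
v = np.sqrt(tau) * (s / rho - Hy / tau)

print(np.allclose(H_bfgs, np.linalg.inv(B_dfp) + np.outer(v, v)))
print(np.linalg.norm(v))   # nonzero here, so the two updates differ
```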
6. A potential application

We conclude this note by comparing the properties of quasi-Newton methods
to those of Newton's method and UHMLE in a potential application to the
problem of obtaining maximum-likelihood estimates of the parameters in mixture
distributions. Such estimates, of course, play a fundamental role in certain
approaches to signature extension, estimation of proportions, and clustering.
For a description of the UHMLE algorithm, see [6] and [7].

Let $X$ be an n-dimensional random variable with probability density
function

    $p(x) = \sum_{i=1}^{m} \alpha_i p_i(x)$,

where

    $p_i(x) = (2\pi)^{-n/2} |\Sigma_i|^{-1/2} \exp\left( -\tfrac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) \right)$

and the proportions $\alpha_i$ are positive and sum to 1. Suppose that $\{x_k\}_{k=1,\ldots,N}$
is a sample of independent observations on $X$. By a maximum-likelihood estimate
of the parameters $\{\alpha_i, \mu_i, \Sigma_i\}_{i=1,\ldots,m}$ we mean a choice of parameters
which locally maximizes the log-likelihood function

    $L = \sum_{k=1}^{N} \log p(x_k)$,

regarded as a function of the parameters $\{\alpha_i, \mu_i, \Sigma_i\}_{i=1,\ldots,m}$. It is known
that, loosely speaking, there is a unique strongly-consistent maximum-likelihood
estimate. (See [7] for a clarification and proof of this statement.)
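
To make the objective concrete, the following Python sketch evaluates the
log-likelihood of a normal mixture for a given sample. It is a minimal
illustration: the function names `normal_density` and `log_likelihood`, the
argument layout (observations stored as the rows of `X`), and the sample
values are assumptions made for the example. It also makes visible that
every evaluation sums over all N observations.

```python
import numpy as np

def normal_density(X, mu, Sigma):
    """Multivariate normal density evaluated at each row of X (shape (N, n))."""
    n = len(mu)
    diff = X - mu
    solved = np.linalg.solve(Sigma, diff.T).T      # Sigma^{-1} (x_k - mu)
    quad = np.sum(diff * solved, axis=1)           # (x_k - mu)^T Sigma^{-1} (x_k - mu)
    const = (2.0 * np.pi) ** (-n / 2.0) * np.linalg.det(Sigma) ** (-0.5)
    return const * np.exp(-0.5 * quad)

def log_likelihood(X, alphas, mus, Sigmas):
    """L = sum_k log( sum_i alpha_i p_i(x_k) )."""
    p = sum(a * normal_density(X, mu, S)
            for a, mu, S in zip(alphas, mus, Sigmas))
    return float(np.sum(np.log(p)))

# Hypothetical two-component mixture in two dimensions.
X_demo = np.array([[0.0, 0.0], [1.0, -1.0], [3.0, 2.5]])
alphas = [0.4, 0.6]
mus = [np.zeros(2), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), 2.0 * np.eye(2)]
L_demo = log_likelihood(X_demo, alphas, mus, Sigmas)
print(L_demo)
```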

The problem which we consider here is to approximate numerically the
strongly-consistent maximum-likelihood estimate. This is potentially a very
difficult problem. Indeed, the number of independent variables is
$(m-1) + mn + m \frac{n(n+1)}{2}$, a number which may be very large. Furthermore,
the evaluation of functions derived from the log-likelihood function usually
involves summation over the entire sample of $N$ observations and, hence, is
a source of computational difficulty when the sample is large. In the table
below, we list the key properties of UHMLE, Newton's method, and
quasi-Newton methods when applied to solving likelihood equations obtained by
differentiating the log-likelihood function. It should be noted that, in
addition to the arithmetic operations listed in the table, each method requires
at each iteration the evaluation of the functions $p_i(x_k)$, $i = 1, \ldots, m$,
$k = 1, \ldots, N$.


METHOD                  CONVERGENCE    ARITHMETIC OPERATIONS PER ITERATION

UHMLE                   Linear         $O(mn^2 N)$

Newton's Method         Quadratic      $O(m^2 n^4 N) + O(m^3 n^6)$

Quasi-Newton Methods    Superlinear    $O(mn^2 N) + O(m^2 n^4)$


Of course, many factors must be considered in addition to convergence
rates and the amount of arithmetic per iteration when deciding what sort of
algorithm is best suited in a particular instance for application to the
problem under consideration. For example, UHMLE is a type of gradient
method; hence, one might expect UHMLE to enjoy the relatively good global
convergence behavior usually associated with gradient methods. Furthermore,
gradient methods are often competitive in speed of convergence with Newton's
method and quasi-Newton methods when only "ball-park" approximations to the
solution are desired. Since the nearness of the maximum-likelihood estimate
to the true parameters will be limited by the variance of the sample
observations, "ball-park" approximations will certainly suffice except, perhaps,
in the case of a very large sample.

It is difficult to predict circumstances in which the advantage of fast
convergence for Newton's method and quasi-Newton methods will outweigh the
disadvantage of having to perform a great many arithmetic operations at each
iteration with these methods. However, it should be noted that if $N$ is
very large relative to $m$ and $n$, then the number of arithmetic operations
per iteration required by quasi-Newton methods is comparable to the number
required by UHMLE. Also, if $N$ is very large, one might reasonably want
to obtain very accurate approximations of the maximum-likelihood estimate,
in which case the superlinear convergence of quasi-Newton methods is clearly
preferable to the linear convergence of UHMLE. Consequently, if $N$ is very
large relative to $m$ and $n$ and if particularly accurate approximations of
the maximum-likelihood estimate are desired, then quasi-Newton methods appear
to have a clear-cut advantage over UHMLE. In such circumstances, one might
retain the good global properties of UHMLE by employing a hybrid method
which initially behaves like UHMLE and then behaves increasingly like a
quasi-Newton method as the iteration proceeds.